Efficient supervised and semi-supervised approaches for affiliations disambiguation
International audience
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions, so searching for the names of persons or organizations is difficult: a single name can appear in different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training corpus is available and uses a Naive Bayesian model. The second assumes that no learning resource is available and uses a semi-supervised approach that mixes soft clustering and Bayesian learning. The results are encouraging and are already partially applied in a scientific survey department. However, we are aware that our approach may have limitations: it cannot efficiently process highly unbalanced data, although solutions are possible in future developments.
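The supervised approach can be illustrated with a small multinomial Naive Bayes classifier over affiliation tokens. This is a minimal sketch, not the authors' implementation: the tokenizer, the Laplace smoothing parameter `alpha`, the class name `AffiliationNaiveBayes`, and the toy training strings are all hypothetical, chosen only to show how variant spellings of one institution pool their token evidence.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(affiliation: str) -> list[str]:
    """Lowercase an affiliation string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", affiliation.lower())


class AffiliationNaiveBayes:
    """Hypothetical multinomial Naive Bayes over affiliation tokens, Laplace-smoothed."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # number of training strings per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()

    def fit(self, strings, labels):
        for text, label in zip(strings, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text: str) -> dict:
        """Unnormalized log P(class | text) for every class seen in training."""
        n_strings = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_label / n_strings)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training set: variant spellings of two institutions.
train = ["Inist-CNRS, Vandoeuvre-les-Nancy", "INIST CNRS Nancy",
         "Univ. Lorraine, LORIA, Nancy", "LORIA, Universite de Lorraine"]
labels = ["INIST", "INIST", "LORIA", "LORIA"]
clf = AffiliationNaiveBayes().fit(train, labels)
print(clf.predict("INIST-CNRS"))      # INIST
print(clf.predict("loria lorraine"))  # LORIA
```

Working in log space avoids numerical underflow when an affiliation string has many tokens.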
Efficient supervised and semi-supervised approaches for affiliations disambiguation
International audience
The disambiguation of named entities is a challenge in many fields such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult, since a single name may appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no learning resource is available and uses a semi-supervised approach that mixes soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also show that the approach has some limitations: it cannot efficiently process highly unbalanced data. Alternative solutions are possible in future developments, in particular using a recent clustering algorithm that relies on feature maximization.
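The semi-supervised variant, which mixes soft clustering and Bayesian learning, can be approximated by fitting a Naive Bayes mixture with expectation-maximization (EM): the E-step computes soft cluster memberships and the M-step re-estimates the Naive Bayes parameters from those weighted memberships. This is a substitute technique shown for illustration, not necessarily the method of the paper. The function name `em_naive_bayes_mixture`, the seeding rule (the first `k` strings each start in their own cluster, all others start uniform), and the example strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def em_naive_bayes_mixture(strings, k, alpha=1.0, iters=20):
    """Soft-cluster strings with EM over a mixture of multinomials (Naive Bayes).

    Returns one membership vector per string; each vector sums to 1.
    Hypothetical seeding: string i < k starts fully in cluster i,
    every other string starts with uniform membership.
    """
    counts = [Counter(tokenize(s)) for s in strings]
    vocab = sorted(set().union(*counts))
    resp = [[1.0 if c == i else 0.0 for c in range(k)] if i < k else [1.0 / k] * k
            for i in range(len(strings))]
    for _ in range(iters):
        # M-step: Naive Bayes parameters estimated from the soft memberships.
        log_prior, log_word = [], []
        for c in range(k):
            weight = sum(r[c] for r in resp)
            log_prior.append(math.log((weight + alpha) / (len(strings) + k * alpha)))
            tok_w = Counter()
            for r, cnt in zip(resp, counts):
                for tok, n in cnt.items():
                    tok_w[tok] += r[c] * n
            denom = sum(tok_w.values()) + alpha * len(vocab)
            log_word.append({t: math.log((tok_w[t] + alpha) / denom) for t in vocab})
        # E-step: posterior cluster membership of every string.
        resp = []
        for cnt in counts:
            logs = [log_prior[c] + sum(n * log_word[c][t] for t, n in cnt.items())
                    for c in range(k)]
            top = max(logs)  # subtract the max before exponentiating, for stability
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
    return resp


# Hypothetical unlabeled affiliation strings.
strings = ["INIST CNRS Nancy", "LORIA Universite de Lorraine",
           "Inist-CNRS Vandoeuvre", "LORIA Nancy Lorraine"]
memberships = em_naive_bayes_mixture(strings, k=2)
print([r.index(max(r)) for r in memberships])  # [0, 1, 0, 1]
```

Seeding from the first `k` strings keeps the sketch deterministic; in practice several initializations would usually be compared, since EM only reaches a local optimum.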
Semi-supervised approach for the disambiguation of affiliations in bibliographic databases
International audience
CorHAL, a route for researchers: simplifying publication deposit to increase the proportion of full texts in HAL
International audience
Launched in spring 2021 and supported by the French Ministry of Higher Education, Research and Innovation (MESRI), corHAL will offer its services at the end of the year. Led by Inist and the CCSD, the project collects metadata on French scientific publications from several repositories. These data are harmonized and enriched through alignments. Duplicate detection ensures the creation of unified records that combine the information from the different sources. Through an alert system (push or pull mode), the service presents researchers with their publications that are missing from HAL. Researchers then choose to automatically import none, one, several, or all of the full texts of their publications into the national open archive.
CorHAL, a tool serving researchers and open science
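The duplicate-detection step that produces unified records can be sketched as grouping records on an identity key and merging their fields. The sketch assumes hypothetical record fields (`source`, `doi`, `title`, `year`, `pages`), hypothetical source names, and a simple key: the DOI when present, otherwise the normalized title plus year. corHAL's actual alignment and matching rules are not described in this abstract and may differ.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Remove accents, punctuation and case so that title variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", no_accents.lower()))


def dedup_key(record: dict) -> tuple:
    """Use the DOI as identity key when present, else normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record.get("title", "")), record.get("year"))


def merge_records(records: list[dict]) -> list[dict]:
    """Group duplicate records and build one unified record per group.

    Earlier records win when sources disagree on a field; a field that is
    empty or missing in one source is filled in from another.
    """
    groups: dict[tuple, dict] = {}  # dicts keep the order of first appearance
    for rec in records:
        unified = groups.setdefault(dedup_key(rec), {"sources": []})
        unified["sources"].append(rec["source"])
        for field, value in rec.items():
            if field != "source" and value and not unified.get(field):
                unified[field] = value
    return list(groups.values())


# Hypothetical records: the same article from two sources, plus an unrelated one.
records = [
    {"source": "source_a", "doi": "10.1234/EXAMPLE",
     "title": "Désambiguïsation des affiliations", "year": 2021},
    {"source": "source_b", "doi": "10.1234/example",
     "title": "Desambiguisation des affiliations", "year": 2021, "pages": "1-10"},
    {"source": "source_b", "doi": None, "title": "Another article", "year": 2020},
]
for unified in merge_records(records):
    print(unified["sources"], unified.get("pages"))
```

Because the key switches between DOI and title, a record that carries a DOI will not match a copy of itself that lacks one; a fuller matcher would compare records on several keys.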
Documentary practices of researchers in the humanities and social sciences (SHS): information seeking
This synthesis first looks at what researchers in the humanities and social sciences (SHS) are looking for (which documents do they work from?) before addressing their sources of information and the way they conduct their searches. One part is devoted to the difficulties they encounter in their hunt for information and to the criticisms they voice about the electronic environment. The final part offers some avenues for reflection and recommendations.
21 pages
Is a quantitative risk assessment of air quality in underground parking garages possible?
Little information is available about the health risks associated with time spent in underground parking garages. The objective of this study was to determine whether these risks can be quantified without epidemiologic data on the subject. We followed the standard health risk assessment procedure. We searched the literature for pollutant concentrations measured in air samples from underground parking garages, for the hazards associated with inhaling these pollutants, and for their toxicological reference values. Occupational and user exposure conditions were estimated through scenarios and used to adapt the toxicological reference values by modifying, with Haber's law, the adjustment factors for exposure frequency and duration. Risk quantification was possible for 39 pollutants. Acute exposures to CO and NO2 exceed toxicological reference values, as does chronic exposure to benzene for threshold effects. The risk of a carcinogenic effect associated with benzene may be greater than 10⁻⁵. Exposure to the air pollution indicators PM and NO2 is also excessive, both relative to the WHO Air Quality Guidelines and relative to levels associated with reported effects in epidemiologic studies. The risk associated with underground parking garages can be evaluated only in part. The information available is nonetheless sufficient to justify actions to reduce exposure.
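The duration adjustment and the carcinogenic-risk comparison in this abstract rest on standard risk-assessment arithmetic. The sketch below is illustrative only: Haber's law is taken in its simplest form (concentration × duration = constant), the excess lifetime risk is the unit risk multiplied by a lifetime-averaged concentration (a 70-year averaging period is assumed), and every numeric input in the example is hypothetical, not a value from the study.

```python
def haber_adjusted_reference(ref_conc: float, ref_hours: float, exposure_hours: float) -> float:
    """Rescale a reference concentration to another exposure duration (Haber's law).

    Haber's law assumes that concentration x duration is constant for a given
    effect, so C_new = C_ref * t_ref / t_new.
    """
    return ref_conc * ref_hours / exposure_hours


def hazard_quotient(exposure_conc: float, reference_conc: float) -> float:
    """A ratio above 1 means the exposure exceeds the reference value."""
    return exposure_conc / reference_conc


def excess_cancer_risk(unit_risk: float, conc: float, hours_per_day: float,
                       days_per_year: float, years: float,
                       lifetime_years: float = 70.0) -> float:
    """Excess lifetime cancer risk for a non-threshold inhaled carcinogen.

    The concentration is averaged over a lifetime using the share of time spent
    in the exposure setting, then multiplied by the unit risk (per µg/m³).
    """
    time_fraction = (hours_per_day / 24) * (days_per_year / 365) * (years / lifetime_years)
    return unit_risk * conc * time_fraction


# Hypothetical inputs, for illustration only (not values from the study).
ref_8h = 10.0                                      # reference value set for 8 h
ref_1h = haber_adjusted_reference(ref_8h, 8, 1)
print(ref_1h)                                      # 80.0
print(hazard_quotient(120.0, ref_1h))              # 1.5
risk = excess_cancer_risk(unit_risk=6e-6, conc=20.0, hours_per_day=8,
                          days_per_year=220, years=40)
print(risk > 1e-5)                                 # True
```

Haber's law is a simplification; for many pollutants the concentration-time relationship is not linear, which is one reason such adjustments are only approximations.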
Exposure to settled dust in indoor environments
The Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (Anses, the French Agency for Food, Environmental and Occupational Health & Safety) endorses the conclusions and recommendations of the collective expert appraisal on characterizing the general population's exposure to chemical substances present in dust deposited on indoor surfaces. This expert work confirms that the ingestion of settled dust in indoor environments should be considered when assessing the population's overall exposure to non-volatile and semi-volatile chemical substances. Regarding dust sampling, Anses recommends harmonizing methods across laboratories by collecting dust by vacuuming, followed by sieving at 250 µm to select the particle size fraction to be analyzed. This recommendation does not apply to the special case of lead, for which a specific standard prescribes wipe sampling. The Agency also encourages studies in France to obtain robust and representative estimates, in the French context, of parameters that strongly influence exposure-dose calculations via dust ingestion. These are in particular indoor dust loading, that is, the quantity of dust per unit surface area, and the quantities of dust ingested per day by age group. Finally, regarding the development of indoor dust guideline values (VGPI) for substances for which dust ingestion may be non-negligible for part of the population, the Agency will continue its expert work and proposes to link it to its ongoing mission of developing indoor air quality guideline values (VGAI).
In the short term, work on the method for developing these values will therefore be launched, to be applied subsequently to substances of interest. In light of previous expert appraisals and the scientific literature reviewed, lead and phthalates appear to be priority substances to investigate in this context.
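The exposure-dose calculation that these parameters feed into can be sketched as concentration in dust × quantity of dust ingested per day ÷ body weight, computed per age group. This is a generic formulation, not Anses's published method; the `AgeGroup` structure, the optional fraction of time spent indoors, and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgeGroup:
    name: str
    dust_ingested_g_per_day: float  # quantity of settled dust ingested per day
    body_weight_kg: float


def dust_ingestion_dose(conc_ug_per_g: float, group: AgeGroup,
                        fraction_time_indoors: float = 1.0) -> float:
    """Daily dose via ingestion of settled dust, in µg per kg body weight per day.

    dose = concentration in dust x dust ingested per day x time fraction / body weight
    """
    return (conc_ug_per_g * group.dust_ingested_g_per_day * fraction_time_indoors
            / group.body_weight_kg)


# Hypothetical age groups and substance concentration, for illustration only.
groups = [AgeGroup("toddler", 0.05, 12.0), AgeGroup("adult", 0.02, 70.0)]
for g in groups:
    print(g.name, round(dust_ingestion_dose(100.0, g), 4))
# toddler 0.4167
# adult 0.0286
```

The example shows why the ingestion rate per age group matters: with the same dust concentration, a higher ingestion rate combined with a lower body weight gives young children a much larger dose per kilogram.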